ABSTRACT


The California Phenology Thematic Collections Network (CAP TCN) is a collaborative project that seeks 
to maximize the value of herbarium specimens and their data, especially for understanding changes in plant 
phenology due to anthropogenic climate change. The project unites personnel in herbaria at California 
universities, research stations, natural history museums, and botanic gardens with the goal of capturing 
images, transcribing label data, and producing georeferenced coordinates of nearly one million preserved 
plant specimens collected over the past 150+ years. Each digitized specimen will also be scored for its 
phenological status—the stage of growth and reproduction of the specimen such as flowering or fruiting. The 
CAP TCN is developing efficient workflows and data standards necessary to collect, store, and analyze trait 
data from specimens to ensure their utility for research and other applications. These novel resources and data 
will enable powerful research in phenology and other topics in the California Floristic Province biodiversity
hotspot and beyond.


Key Words: ADBC — NSF, California Consortium of Herbaria (CCH), digitization, herbarium, iDigBio, 
natural history collections, phenology, specimens, thematic collections network. 


Dried, pressed plant specimens preserved in 
herbaria have been essential sources of biodiversity 
data for centuries (Lavoie 2013). In the modern 
information era, data from specimens are being used 
for an increasing number of purposes, for example, 
documenting plant distributions in time and space 
and allowing for comparisons of plant morphology, 
biochemistry, and genetic variation among individu- 
als, taxonomic groups, and lineages (Pyke and 
Ehrlich 2010; Lavoie 2013; Thornhill et al. 2017; 
Meineke et al. 2018a; Lang et al. 2018). Herbarium 
specimens have also proven critical in areas of major 
societal concern, such as understanding the effects of 
anthropogenic change due to pollution (Penuelas and 
Filella 2002; Zschau et al. 2003), land-use change 
(Case et al. 2007), and climate change (Calinger et al. 
2013; Wolf et al. 2016; Gonzalez-Orozco et al. 2016), 
among other topics. For example, spatiotemporal 
specimen data have enabled researchers to track the 
spread of economically important invasive species 
(Chauvel et al. 2006), identify changes in the 
distributions of native species (Farnsworth and 
Ogurcak 2006; Case et al. 2007), and establish 
conservation priorities for vulnerable taxa (Kling et 
al. 2018). 

Herbarium digitization—capturing standardized 
label data and high-resolution images of herbarium 
specimens—makes large amounts of high-quality 
data readily and globally available online, which 
accelerates research and facilitates the development 
of new research tools and methods (Elith et al. 2006;
James et al. 2018; Pearson 2018). Specimen images, in
particular, have opened doors to new avenues of
research, including automated species identification 
of herbarium specimens (Carranza-Rojas et al. 2017) 
and the detection of shifts in plant phenological 
events in response to climate change (Willis et al.
2017). Digital specimen images may also accelerate
characterization of other plant features such as 
evidence of herbivory (see Meineke et al. 2018b) or 
disease (see Antonovics et al. 2003). 

Despite its importance for advancing research, 
herbarium specimen digitization remains an enor- 
mous task for the world’s herbaria. In California, 
herbarium digitization has been underway since the 
early 1990s and was accelerated with the establish- 
ment of the Consortium of California Herbaria in 
2003 (CCH; http://ucjeps.berkeley.edu/consortium/about.html),
which has since grown to include 2.2
million specimen records from 40 institutions. Even 
with these efforts, digitization is far from complete in 
the state’s herbaria; hundreds of thousands of 
specimens remain in analog format only, and only 
7% of currently digitized California specimens have 
been imaged as of March 2019, according to the 
national data aggregator iDigBio (idigbio.org). 
While label data alone can be used to address certain 
scientific questions, high-resolution images of herbarium
specimens are necessary for verifying taxon
identification and for providing data regarding plant
traits that can be scored upon visual inspection, 
including plant size, vegetative or floral herbivory, 
evidence of pathogens, morphology, and precise 
reproductive status. 

The California Phenology Thematic Collections 
Network (CAP TCN; https://www.capturingcaliforniasflowers.org/)
was established by a grant from the
Advancing Digitization of Biodiversity Collections 
(ADBC) program of the United States National Science 
Foundation. The CAP TCN aims to generate nearly 
one million high-resolution images of herbarium 
specimens from 22 California institutions. Each speci- 
men record will consist of an image, transcribed label 
data, georeferenced coordinates when possible, and 
phenological data. To share these data widely, the CAP
TCN has established a new web-accessible database 
system (cch2.org) available for use by researchers, land 
managers, educators, and any member of the public. 
Along with specimen collection dates, the reproductive 
status (e.g., the presence or number of unopened or 
open flowers, inflorescences, or mature or dehiscing 
fruits) of all imaged specimens will be scored and
will form the basis of a significant phenological dataset
that can be used to study the effects of climate change 
on the seasonal cycles of plants in California, a 
biodiversity hotspot. 

California has the most diverse native flora of any 
state in the U.S., containing more than one third of 
all U.S. plant species. The state is considered a 
biodiversity hotspot due to its high diversity, high 
number of endemic taxa, and major threats (Raven 
and Axelrod 1978; Myers et al. 2000; Baldwin et al. 
2012, 2017). The state’s flora includes nearly 7,700 
minimum-rank taxa (including species and infraspe- 
cific taxa), of which 6,572 (85%) are native and 2,303 
(30%) are endemic (Jepson Flora Project 2019). This 
diverse and highly endemic flora is also highly 
endangered; the California Native Plant Society 
classifies about one third of the state’s native taxa 
as taxa of special conservation concern (CNPS 2018), 
and nearly 4% of taxa are state or federally listed as 
endangered, threatened, or rare (CDFW 2019). In 
this context, rapid land use changes and anthropo- 
genic climate change pose a heightened threat to the 
California flora. The rising temperatures predicted by 
climate models are already being observed in 
California (Parmesan and Yohe 2003; Kelly and 
Goulden 2008), and this change is impacting the 
state’s plants (Rapacciuolo et al. 2014). Understand- 
ing how plant species, populations, and communities 
change with time and space across the state is critical 
to directing conservation efforts, land management, 
and future scientific inquiry. By producing thorough, 
high quality data records, georeferenced coordinates, 
and images of nearly one million herbarium speci- 
mens, the CAP TCN digitization project will greatly 
advance our understanding of the state’s flora. 


Phenology 


Investigating changes in plant phenology, the 
timing of plant growth and reproduction, is a key 
application for the data produced by the CAP TCN. 
Phenological change is one of the most significant 
and widely recognized effects of climate change 
(Walther et al. 2002; Parmesan and Yohe 2003; 
Calinger et al. 2013; Willis et al. 2017), and such 
change may pose a heightened threat to the
California flora (Loarie et al. 2008). Numerous 
ecological functions depend on plant phenology at 
multiple levels of biological organization, from 
individuals to ecosystems. Phenology not only affects 
the individual fitness of plants, but also the fitness of 
organisms that depend on plants, such as mutualistic 
pollinators and seed-dispersers or antagonistic herbivores
and parasites (Visser and Both 2005; Both et
al. 2006). This in turn can affect population-level 
processes such as population growth, mating pat- 
terns, gene flow, and evolution (Franks and Weis 
2009; Ozgul et al. 2010; Anderson et al. 2012). For
example, temporal mismatches between plants and 
pollinators can extirpate local populations of both 
members of a mutualistic pair, cause rapid evolu- 
tionary shifts, and result in billions of dollars of 
agricultural losses (Memmott et al. 2007; McKinney 
et al. 2012; Kudo and Ida 2013; Miller-Struttmann et 
al. 2015). As climate change progresses, phenology- 
dependent interactions between plants and their 
mutualists and antagonists will likely change with 
unknown consequences for biodiversity or agricul- 
tural systems (Encinas-Viso et al. 2012; Matthews 
and Mazer 2016). Understanding changes in plant 
phenology is important not only to improve our 
understanding of—and our ability to forecast— 
ecological change but also to address practical 
environmental problems in both agricultural and 
natural settings. 

Herbarium specimens preserve invaluable pheno- 
logical data for thousands of plants across time and 
space. Although most herbarium specimens are not 
generally collected with the purpose of conducting 
phenological research, the phenological status of a 
specimen can usually be ascertained from reproduc- 
tive structures visible on the sheet. The use of 
herbarium specimens to track the relationship 
between local climatic conditions and the collection 
dates of flowering specimens has a relatively short 
history (Willis et al. 2017). Nevertheless, several 
herbarium-based studies have corroborated the link 
between phenological events and climate change that 
was first observed in long-term, place-based studies 
(Primack et al. 2004; Lavoie and Lachance 2006; 
Davis et al. 2015; Willis et al. 2017), despite known 
geographic, temporal, and taxonomic biases of 
herbarium records (Daru et al. 2017). These studies 
have improved our understanding of narrow- and 
broad-scale phenological shifts among many taxa 
and in many regions across the globe (Willis et al. 
2017). They have also elucidated the specific advan- 
tages of herbarium specimens for phenological 
research, such as filling gaps in long-term or 
observational data sets for a period of time (Meyer 
et al. 2016; Willis et al. 2017), underrepresented
regions (Li et al. 2013), and threatened or rare taxa 
(Robbirt et al. 2011). 

Digitizing California’s herbaria will unlock long- 
term phenological records for nearly a million 
specimens, some of which date back to the late 
1800s. With these data, researchers will be able to 
generate an unprecedented picture of the relationship 
between phenology and climate change in California 
and how this relationship varies with different taxa 
and phenophases (phases within a phenological 
event, such as full flowering or end of flowering). 
The phenological data generated by this project will 
help answer questions such as: (1) To what extent can 
observed phenological sensitivities to climate conditions
be generalized among, for example, congeneric
species, confamilial genera, or distinct families? (2) 
Are the flowering times of individual taxa more 
strongly associated with climate normal values (mean 
climate values over 30 years), suggesting that 
flowering time has evolved in response to long-term 
climatic conditions, or with climate conditions in the 
single year or season preceding the collection date? 
(3) For which species, genera, families, vegetation 
types, or functional groups do multivariate pheno- 
climate models best predict flowering times? (4) 
Which habitats and vegetation types are most 
phenologically sensitive to changes in precipitation 
and temperature? (5) Do different functional groups 
(e.g., evergreen versus deciduous taxa, annual herbs, 
geophytes) differ in their responses to long-term 
changes in temperature and precipitation? (6) Are 
rare species more (or less) phenologically sensitive to 
climate than widespread species? (7) Do the pheno- 
logical sensitivities of species occupying highly water- 
limited habitats (e.g., deserts, serpentine outcrops, 
south-facing slopes) differ from those of species 
occupying more mesic habitats? and (8) In systems 
for which we have phenological data on pollinators, 
pathogens, and pests, where might phenological 
mismatches occur between flowering plant species, 
including agricultural plants, and these interacting 
taxa? In addition, with more robust assessments of
the phenological status of herbarium specimens,
along with historical and contemporary climatic data 
available online through PRISM (Daly et al. 1994, 
2008) and ClimateNA (Wang et al. 2016), researchers 
will be poised to generate novel predictions concern- 
ing the effects of upcoming climate change on the 
seasonal cycles of individual California plant species 
and the communities that they constitute (cf. Park et 
al. 2019; Park and Mazer 2018, 2019). 
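As a minimal sketch of how a question such as (2) might be approached, the example below estimates a phenological sensitivity as the slope of an ordinary least-squares regression of collection day of year on temperature. Treating the slope (days per degree Celsius) as the sensitivity is one common convention and is assumed here, not prescribed by the project; the function name and the records are hypothetical and serve only to illustrate the calculation.

```python
from typing import Sequence, Tuple


def ols_slope_intercept(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical flowering specimens of one taxon:
# (mean spring temperature in degrees C, collection day of year)
records = [(10.0, 130), (11.0, 126), (12.0, 122), (13.0, 118), (14.0, 114)]
temps = [t for t, _ in records]
days = [d for _, d in records]

# A negative slope means earlier collection dates in warmer conditions.
sensitivity, intercept = ols_slope_intercept(temps, days)
print(f"sensitivity: {sensitivity:.1f} days per degree C")
```

Running the same fit once with 30-year climate normals and once with the conditions of the year preceding collection, then comparing the fits, would be one simple way to contrast the two hypotheses; a real analysis would also need to account for the collection biases noted above.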

Phenological research—though important in it- 
self—is only the beginning of the many applications 
for digital specimen images, and the potential for 
these specimens and their associated data to enhance 
our understanding of ecology, systematics, evolution, 
biodiversity, and the effects of anthropogenic change 
is rapidly increasing. The CAP TCN empowers this 
research, unites and supports existing data-providers, 
and makes new resources available for the explora- 
tion of natural history collections. 


PARTICIPANTS 


The CAP TCN currently comprises 22 California 
institutions: 11 California State University campuses, 
seven University of California campuses, two botanic 
gardens, one natural history museum, and one 
California Department of Parks and Recreation 
research station (Fig. 1). Over the duration of this 
project, hundreds of undergraduates and members of
the public will engage with California’s herbaria and
learn about the importance of natural history
collections. The project is directed by California
Polytechnic University, San Luis Obispo (OBI) with
additional leadership from UC Berkeley (UC/JEPS)
and UC Santa Barbara (UCSB). Despite ambitious
digitizing efforts among California herbaria in the
past, prior to the CAP TCN, only a few herbaria had
generated images of their specimens (RSA, SD,
SDSU, SJSU, UC/JEPS, and UCSB). In the future,
hopefully every specimen in California herbaria will
be accompanied by a high-resolution image available
online.

FIG. 2. Target families to be digitized for the CAP TCN project. Each radial bar indicates the number of specimen records
of that family that will have phenological data and images in the CAP data portal, CCH2.


PROJECT GOALS 


The sheer number of plant specimens in California 
herbaria precludes imaging all of them in a single 
project. Therefore, to build the most robust dataset 
for detecting phenological shifts, the CAP TCN is 
targeting the oldest records, the most diverse
families, and the families with the most endemic 
and threatened taxa (Fig. 2). These families also 
include many species that represent model systems 
for evolutionary research; the families Asteraceae, 
Brassicaceae, Onagraceae, Phrymaceae, and Polemoniaceae
have been the focus of years of research in
systematics and evolutionary biology. In addition, 
the CAP TCN is targeting the Adoxaceae, Agavaceae,
Sapindaceae, Zygophyllaceae, and an additional
250 taxa that are currently monitored by the USA
National Phenology Network (usanpn.org) and the 
California Phenology Project (cpp.usanpn.org) to 
provide robust comparisons between the historical 
record of plant phenology (based on herbarium 
specimens; Fig. 3) and the contemporary record 
(based on in situ observations of living plants; 
Haggerty et al. 2013; Matthews et al. 2014). In total, 
the CAP TCN aims to make available 904,200 
imaged, fully databased, and georeferenced herbar- 
ium specimen records. The project will also produce 
phenological scores (e.g., annotations of unopened 
flowers, open flowers, and fruit) of all 904,200 
specimens according to cooperatively-developed phe- 
nological standards and protocols. 


PROJECT IMPLEMENTATION 


Equipment and Workflows 


Producing such a large number of specimen
images within four years requires new equipment,
efficient protocols, training, and, most importantly,
strong collaboration. Fortunately, the CAP TCN
operates within the collaborative network of other
ADBC-funded digitization projects, such as the Mid-Atlantic
Megalopolis TCN and Southeastern Collaborative
Network of Expertise and Collections
(SERNEC), both of which have provided instrumental
guidance in establishing CAP TCN protocols and
best practices. Representatives of most CAP TCN
participating institutions (Fig. 1) attended the ADBC
Summit in October 2018 to kickstart active communication
and receive initial training. Institutions have
purchased and assembled new imaging stations, each
including a high-resolution camera, lighting equipment,
and necessary software for capturing and
processing images. CAP leadership is constantly
developing workflows and protocols for use across
the network, drawing heavily from existing resources
and best practices disseminated by iDigBio (idigbio.org),
other Thematic Collections Networks, and
other herbaria. While each herbarium in the CAP
TCN adapts these workflows and protocols to fit
its own institution, they all begin with an
understanding of digitization best practices, and, as
a result, the images and data meet a specified
standard of quality. Site-visits, webinars, and conference
calls with the project manager build capacity
at each herbarium and ensure that project goals are
met. The developed resources, including training
manuals, videos, protocols, and quarterly reports,
are publicly available on the project website:
capturingcaliforniasflowers.org. A simplified diagram
of the CAP TCN digitization workflow is
shown in Figure 4.

FIG. 3. Numbers of herbarium specimen records in the CAP TCN data portal, CCH2, according to date of collection. The
approximate onset of recent climate warming is indicated by the black dashed line (Hodgkins et al. 2003).


Data Portal 


Once specimen images are produced and processed, 
they are made available online through a new 
specimen data portal, CCH2 (cch2.org). The CCH2 
portal contains all specimen data—regardless of taxon 
or collection location—from all collaborating institu- 
tions. This portal greatly expands institutions’ abilities 
to curate and digitize specimens because it leverages 
Symbiota, an open source content management
system that enables storage, sharing, and active
curation of specimen data (symbiota.org; Gries et al. 
2014). Symbiota software, and thus the CCH2 portal, 
contains numerous data curation and management 
tools, such as data cleaning and georeferencing 
modules. The CAP TCN is developing new tools to 
capture phenological data from specimen images (see 
Phenological Scoring), and several of these tools can 
be co-opted to capture other trait data (e.g., leaf 
measurements, herbivore damage). 

Previously, many herbaria in the CAP TCN 
lacked interoperable databases that allowed efficient 
curation, cleaning, and sharing of specimen data. 
Now, using a web browser, each herbarium can 
actively manage its own data in the CCH2 portal, 
and data managers, users, and the public can view 
images and data, including phenological data, from 
these collections as soon as information is added. 
Institutions with managers who prefer to manage 
specimen data in a local database can still share their 
data by regularly uploading a “snapshot” of their 
dataset to the portal, which also becomes publicly 
accessible upon upload. Data entered in the CCH2 
portal, either “live” or through a “snapshot” upload, 
are automatically mapped to biodiversity data 
standards provided by Darwin Core (Wieczorek et 
al. 2012b), promoting interoperability of data be- 
tween institutions worldwide. Also because of these 
standards, CCH2 data are easily parsed and distributed
via global data aggregators iDigBio (idigbio.org)
and GBIF (gbif.org). The earlier California
Consortium of Herbaria public interface, now known 
as CCH1 (http://ucjeps.berkeley.edu/consortium/), 
will also draw from the CCH2 portal and display 
only data from vascular plant specimens collected in 
California, integrating closely with the Jepson eFlora 
(http://ucjeps.berkeley.edu/eflora/). 
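To illustrate what mapping a local "snapshot" to Darwin Core involves, the sketch below renames the columns of one record to Darwin Core terms. It is not the Symbiota implementation, which performs this mapping within the portal. The local column names, the catalog number, and the function name are hypothetical; the target names (catalogNumber, scientificName, recordedBy, eventDate, decimalLatitude, decimalLongitude, reproductiveCondition) are standard Darwin Core terms.

```python
from typing import Dict, List, Tuple

# Hypothetical local column names mapped to standard Darwin Core terms.
LOCAL_TO_DWC: Dict[str, str] = {
    "accession": "catalogNumber",
    "taxon": "scientificName",
    "collector": "recordedBy",
    "date_collected": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "phenology": "reproductiveCondition",
}


def to_darwin_core(row: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename known local columns to Darwin Core terms.

    Returns the mapped record and the list of local columns that had no
    mapping, so that a data manager can decide how to handle them.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for key, value in row.items():
        term = LOCAL_TO_DWC.get(key)
        if term is None:
            unmapped.append(key)
        else:
            mapped[term] = value
    return mapped, unmapped


# Hypothetical local record; "shelf" has no Darwin Core equivalent in the map.
row = {
    "accession": "OBI12345",
    "taxon": "Clarkia unguiculata",
    "lat": "35.3",
    "shelf": "B2",
}
mapped, unmapped = to_darwin_core(row)
print(mapped)
print("unmapped columns:", unmapped)
```

Reporting unmapped columns rather than silently dropping them keeps local information visible to the herbarium that owns the data.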

The new CCH2 portal will be used to georeference
all targeted specimens that have not already been
assigned latitude and longitude coordinates in
previous projects (e.g., Baldwin et al. 2017). The
CCH2 portal, like most Symbiota instances, is
equipped with GEOLocate (Rios and Bart 2010),
batch georeferencing tools, and botanical duplicate
identification tools, each of which will facilitate
efficient georeferencing of herbarium specimen records
according to existing community-established
protocols (Fig. 4; Wieczorek et al. 2012a).

FIG. 4. Simplified diagram of the CAP TCN workflow from image capture to data use.


Phenological Scoring 


To capture phenological data, the CAP TCN is 
developing workflows for scoring phenology in a 
number of ways (Fig. 4). Once phenological data 
standards are established and appropriate tools are 
developed in the data portal, some institutions may 
capture phenological traits concurrently with other 
digitization steps (e.g., label transcription). The new 
phenological data fields on the occurrence record 
page of the portal, developed for this project, will 
greatly facilitate this process. The project is also 
expanding the Image Scoring Tool, a Symbiota 
module developed for the New England Vascular 
Plants Thematic Collections Network (nevp.org). 
The Image Scoring Tool presents users with images 
of specimens and allows them to apply a score (e.g., 
“unopen flowers absent, open flowers present, and 
fruit present”) to each record. These scores are
recorded as annotations to the specimen record 
according to Darwin Core-compatible standards 
(see Yost et al. 2018 for proposed scoring schema), 
which will be fully interoperable with phenological 
data produced by other means via mapping to the 
Plant Phenology Ontology (Stucky et al. 2018). 

The project will also use and build upon another 
newly-developed Symbiota function, the Attribute 
Mining Tool. This tool allows editors to search any 
database field for certain words that refer to 
reproductive states and apply a phenological score 
according to set definitions. For example, an editor 
can search the “Notes” field of the database for the
word “flower” and all unique text strings containing
“flower” will be shown. The editor can then select all 
records that are suitable and apply the same 
phenological score to all selected records at once. 
This tool greatly facilitates adding trait attributes to 
records and could be expanded for any number of 
other traits. 
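The search-then-batch-score pattern described above can be sketched as follows. This is an illustration of the idea only, not the Symbiota Attribute Mining Tool itself; the record structure, field names, and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_unique_strings(records: List[dict], field: str, keyword: str) -> Dict[str, List[int]]:
    """Group record ids by each distinct field value containing the keyword.

    Matching is case-insensitive; the original text is kept as the key so an
    editor can review exactly what was written on the label.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for rec in records:
        text = (rec.get(field) or "").strip()
        if keyword.lower() in text.lower():
            groups[text].append(rec["id"])
    return dict(groups)


def apply_score(records: List[dict], selected_ids: Iterable[int], score: str) -> int:
    """Apply one phenological score to every selected record; return the count."""
    ids = set(selected_ids)
    updated = 0
    for rec in records:
        if rec["id"] in ids:
            rec["phenology_score"] = score
            updated += 1
    return updated


# Hypothetical records with free-text notes.
records = [
    {"id": 1, "notes": "Flowers yellow"},
    {"id": 2, "notes": "Flowers yellow"},
    {"id": 3, "notes": "No flowers seen; sterile"},
    {"id": 4, "notes": "In fruit"},
]

groups = find_unique_strings(records, "notes", "flower")
# The editor reviews the unique strings and selects only the suitable ones.
updated = apply_score(records, groups["Flowers yellow"], "open flowers present")
print(groups)
print("records scored:", updated)
```

The third record shows why the review step matters: its note contains the search word but describes a sterile specimen, so it is returned by the search and deliberately left unscored.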

Some phenological scores will be produced via 
crowdsourcing using the citizen science platform 
Notes from Nature (notesfromnature.org; see be- 
low). These scores will represent the consensus scores 
offered by at least three independent volunteer 
scorers, requiring expert review only when results 
are ambiguous. Finally, there are many opportunities 
to explore automated phenological scoring of her- 
barium specimen images. Investigators at several 
collaborating institutions, as well as the CAP TCN 
project manager, are exploring the use of machine 
learning (e.g., neural networks) to automatically 
score the phenology of individual specimens, a 
workflow that has shown much promise (Lorieul et 
al. 2019). All phenological scores will be associated 
with their specimen records and therefore be
accessible through the CCH2 portal. 
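The crowdsourced consensus step mentioned above might be sketched as follows. The source specifies only that consensus requires at least three independent scorers and that ambiguous results go to expert review; the strict-majority rule used here is an assumption, and the function name and score labels are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def consensus_score(scores: Sequence[str], min_scorers: int = 3) -> Tuple[Optional[str], bool]:
    """Return (consensus, needs_review) for one specimen.

    A consensus is accepted only if at least `min_scorers` volunteers scored
    the specimen and one score holds a strict majority (an assumed rule).
    Otherwise the specimen is flagged for expert review.
    """
    if len(scores) < min_scorers:
        return None, True
    top_score, top_count = Counter(scores).most_common(1)[0]
    if top_count * 2 > len(scores):
        return top_score, False
    return None, True


# Hypothetical volunteer scores for three specimens.
print(consensus_score(["flowering", "flowering", "fruiting"]))
print(consensus_score(["flowering", "fruiting", "sterile"]))
print(consensus_score(["flowering", "flowering"]))
```

Returning an explicit review flag keeps the ambiguous cases separate from the accepted scores, so expert time is spent only where volunteers disagree or too few scores exist.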


GET INVOLVED 


Specimen digitization entails a number of activi- 
ties, such as pre-curation, barcoding, photographing, 
and image processing, that require a great deal of 
human participation. Each institution relies on the 
dedication of faculty, staff, paid and unpaid students 
(some receiving research credit for their work), and 
community volunteers to conduct digitization. Many 
institutions are building partnerships with naturalist 
and environmentalist groups, such as local chapters 
of the California Native Plant Society, to accelerate
the rate of digitization and engage the broader 
community. For instance, several institutions are 
crowdsourcing label capture using Notes from 
Nature, an online platform that engages citizen 
scientists to transcribe label data from images into 
designated text fields (notesfromnature.org). The
CAP TCN will also collaborate with Notes from 
Nature to create phenology-themed online “expedi- 
tions” in which volunteers use specimen images to 
score specimens’ phenological statuses. 

All interested groups and individuals are invited to 
contribute via online crowdsourcing opportunities 
and to participate in on-site citizen science events 
(e.g., the Worldwide Engagement for Digitizing 
Biodiversity Collections event; wedigbio.org) as they 
become available. Interested parties also have the 
opportunity to participate in one of several phenol- 
ogy workshops that will be conducted at UC Santa 
Barbara, Santa Barbara Botanic Garden, Rancho 
Santa Ana Botanic Garden, and/or UC Berkeley in 
2020-2022. These workshops will expose participants 
to the importance of phenological observations, the 
resources available in the CCH2 portal, and native 
pollinators and native plants, and they will encour- 
age the continuation of current phenological moni- 
toring programs. More information about each of 
these ways to get involved—Notes from Nature 
expeditions, on-site digitization events, and phenol- 
ogy workshops—will be posted on the project 
website (capturingcaliforniasflowers.org). 

The CAP TCN further invites all herbaria that are 
either located in California or can share specimen 
data representing the California Floristic Province to 
collaborate with the CAP TCN. Posting and 
managing data on the CCH2 data portal is an 
efficient and effective way to improve data quality, 
share specimen data, and enhance institution visibil- 
ity, and all are welcome to engage in the CAP TCN 
project. 


CONCLUSION 


Capturing images of herbarium specimens is a 
critical task for mobilizing herbarium specimen data 
for research and education. Specimen images can 
provide a wealth of data that characterize the 
morphology, pathology, and phenology of the plants 
they represent. The California Phenology Thematic 
Collections Network will add nearly one million 
images from the United States’ most diverse floristic 
province to the public sphere. Accompanying these 
images will be full label data, georeferenced coordi- 
nates, and the phenological statuses of the specimens. 
These data will rapidly enable and inspire research on 
California’s changing flora through integration with 
the California Phenology Project, the Plant Phenol- 
ogy Ontology, and data generated through other 
digitization projects. We invite all interested individ- 
uals, groups, and herbarium collections to contribute 
to and benefit from this growing resource. 